Low-Complexity Near-ML Decoding of Large Non-Orthogonal STBCs using Reactive Tabu Search

N. Srinidhi, Saif K. Mohammed, A. Chockalingam, and B. Sundar Rajan 
Department of ECE, Indian Institute of Science, Bangalore 560012, INDIA 



Abstract — Non-orthogonal space-time block codes (STBC) with 
large dimensions are attractive because they can simultaneously 
achieve both high spectral efficiencies (same spectral efficiency 
as in V-BLAST for a given number of transmit antennas) as well 
as full transmit diversity. Decoding of non-orthogonal STBCs 
with large dimensions has been a challenge. In this paper, we 
present a reactive tabu search (RTS) based algorithm for decod- 
ing non-orthogonal STBCs from cyclic division algebras (CDA) 
having large dimensions. Under i.i.d. fading and perfect channel state information at the receiver (CSIR), our simulation results show that RTS based decoding of a 12 × 12 STBC from CDA with 4-QAM (288 real dimensions) achieves i) $10^{-3}$ uncoded BER at an SNR just 0.5 dB away from SISO AWGN performance, and ii) a coded BER performance within about 5 dB of the theoretical MIMO capacity, using a rate-3/4 turbo code at a spectral efficiency of 18 bps/Hz. RTS is shown to achieve near SISO AWGN performance with fewer dimensions than the LAS algorithm (which we reported recently), at some extra complexity compared to LAS. We also report good BER performance of
RTS when the i.i.d. fading and perfect CSIR assumptions are relaxed
by considering a spatially correlated MIMO channel model, and 
by using a training based iterative RTS decoding/channel esti- 
mation scheme. 

Keywords — Non-orthogonal STBCs, large dimensions, low-comp- 
lexity near-ML decoding, tabu search, high spectral efficiencies. 

I. Introduction 
MIMO systems that employ non-orthogonal space-time block codes (STBC) from cyclic division algebras (CDA) for an arbitrary number of transmit antennas, $N_t$, are attractive because they can simultaneously provide both full rate (i.e., $N_t$ complex symbols per channel use, which is the same as in V-BLAST) and full transmit diversity [1],[2]. The 2 × 2 Golden code is a well-known non-orthogonal STBC from CDA for 2 transmit antennas [3]. High spectral efficiencies of the order of tens of bps/Hz can be achieved using large non-orthogonal STBCs. For example, a 16 × 16 STBC from CDA has 256 complex symbols in it with 512 real dimensions; with 16-QAM and a rate-3/4 turbo code, this system offers a high spectral efficiency of 48 bps/Hz. Decoding of non-orthogonal STBCs with such large dimensions, however, has been a challenge. The sphere decoder and its low-complexity variants are prohibitively complex for decoding such STBCs with hundreds of dimensions. Recently, we proposed a low-complexity near-ML achieving algorithm to decode large non-orthogonal STBCs from CDA; this algorithm, which is based on a bit-flipping approach, is termed the likelihood ascent search (LAS) algorithm [4]-[6]. In this paper, we present a reactive tabu search (RTS) based approach to near-ML decoding of non-orthogonal STBCs with large dimensions.
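The spectral efficiency and dimension counts quoted above follow from simple arithmetic for a square, full-rate STBC. The sketch below reproduces them; the function names are illustrative and not from the paper.

```python
import math

def spectral_efficiency(n_t: int, qam_order: int, code_rate: float) -> float:
    """bps/Hz of a full-rate STBC, which carries n_t complex symbols per channel use."""
    bits_per_symbol = math.log2(qam_order)
    return n_t * bits_per_symbol * code_rate

def real_dimensions(n_t: int) -> int:
    """A square full-rate n_t x n_t STBC has n_t**2 complex symbols, i.e. 2*n_t**2 real dimensions."""
    return 2 * n_t * n_t

# 16 x 16 STBC, 16-QAM, rate-3/4 turbo code
print(spectral_efficiency(16, 16, 0.75), real_dimensions(16))  # 48.0 512
# 12 x 12 STBC, 4-QAM, rate-3/4 turbo code
print(spectral_efficiency(12, 4, 0.75), real_dimensions(12))   # 18.0 288
```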

Key attractive features of the proposed RTS based decoding are its low complexity and near-ML performance in systems with large dimensions (e.g., hundreds of dimensions). While creating hundreds of dimensions in space alone (e.g., V-BLAST) requires hundreds of antennas, use of non-orthogonal STBCs from CDA can create hundreds of dimensions with just tens of antennas (space) and tens of channel uses (time). Given that 802.11 smart WiFi products with 12 transmit antennas$^1$ at 2.5 GHz are now commercially available [7] (which establishes that issues related to placement of many antennas and RF/IF chains can be solved in large aperture communication terminals like set-top boxes/laptops), large non-orthogonal STBCs (e.g., 16 × 16 STBC from CDA) in combination with large dimension near-ML decoding using RTS can enable communications at increased spectral efficiencies of the order of tens of bps/Hz (note that current standards achieve only < 10 bps/Hz using only up to 4 transmit antennas).

Tabu search (TS), a heuristic originally designed to obtain approximate solutions to combinatorial optimization problems [8]-[11], is increasingly applied in communication problems [12]-[14]. For example, in [12], the design of constellation label maps to maximize asymptotic coding gain is formulated as a quadratic assignment problem (QAP), which is solved using RTS [11]. The RTS approach is shown to be effective in terms of BER performance and efficient in terms of computational complexity in CDMA multiuser detection [13]. In [14], a fixed TS based detection in V-BLAST is presented. In this paper, we establish that RTS based decoding of non-orthogonal STBCs can achieve excellent BER performance (near-ML and near-capacity performance) in large dimensions at practically affordable low complexities. We also present a stopping criterion for the RTS algorithm. RTS for large dimension non-orthogonal STBC decoding has not been reported so far. Our results in this paper can be summarized as follows:

• Under i.i.d. fading and perfect channel state information at the receiver (CSIR), our simulation results show that RTS based decoding of a 12 × 12 STBC from CDA with 4-QAM (288 real dimensions) achieves i) $10^{-3}$ uncoded BER at an SNR just 0.5 dB away from SISO AWGN performance, and ii) a coded BER performance within about 5 dB of the theoretical capacity using a rate-3/4 turbo code at a spectral efficiency of 18 bps/Hz.

• Compared to the LAS algorithm we reported recently in [4]-[6], RTS achieves near-SISO AWGN performance with fewer dimensions than LAS; this is achieved at some extra complexity compared to LAS.

• We report good BER performance when the i.i.d. fading and perfect CSIR assumptions are relaxed by adopting a spatially correlated MIMO channel model, and a training based iterative RTS decoding/channel estimation scheme.

$^1$ 12 antennas in these products are now used only for beamforming. Single-beam multi-antenna approaches can offer range increase and interference avoidance, but not spectral efficiency increase.



The rest of this paper is organized as follows. The non-orthogonal STBC MIMO system model is presented in Section II. The RTS algorithm for decoding non-orthogonal STBCs and the proposed stopping criterion are presented in Section III. Simulation results, including uncoded and coded BER performance of RTS decoding with i) perfect CSIR, ii) estimated CSIR using an iterative RTS decoding/channel estimation scheme, and iii) the effect of spatial correlation, are presented in Section IV. Conclusions are given in Section V.

II. Non-Orthogonal STBC MIMO System Model 

Consider an STBC MIMO system with multiple transmit and receive antennas. An $(n, p, k)$ STBC is represented by a matrix $\mathbf{X}_c \in \mathbb{C}^{n \times p}$, where $n$ and $p$ denote the number of transmit antennas and number of time slots, respectively, and $k$ denotes the number of complex data symbols sent in one STBC matrix. The $(i,j)$th entry in $\mathbf{X}_c$ represents the complex number transmitted from the $i$th transmit antenna in the $j$th time slot. The rate of an STBC is $k/p$. Let $N_r$ and $N_t = n$ denote the number of receive and transmit antennas, respectively. Let $\mathbf{H}_c \in \mathbb{C}^{N_r \times N_t}$ denote the channel gain matrix, where the $(i,j)$th entry in $\mathbf{H}_c$ is the complex channel gain from the $j$th transmit antenna to the $i$th receive antenna. We assume that the channel gains remain constant over one STBC matrix and vary (i.i.d.) from one STBC matrix to the other. Assuming rich scattering, we model the entries of $\mathbf{H}_c$ as i.i.d. $\mathcal{CN}(0, 1)$. The received space-time signal matrix, $\mathbf{Y}_c \in \mathbb{C}^{N_r \times p}$, can be written as

$$\mathbf{Y}_c = \mathbf{H}_c \mathbf{X}_c + \mathbf{N}_c, \tag{1}$$

where $\mathbf{N}_c \in \mathbb{C}^{N_r \times p}$ is the noise matrix at the receiver and its entries are modeled as i.i.d. $\mathcal{CN}(0, \sigma^2 = N_t E_s / \gamma)$, where $E_s$ is the average energy of the transmitted symbols and $\gamma$ is the average received SNR per receive antenna [?], and the $(i,j)$th entry in $\mathbf{Y}_c$ is the received signal at the $i$th receive antenna in the $j$th time slot. Consider linear dispersion STBCs, where $\mathbf{X}_c$ can be written in the form [15]

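$$\mathbf{X}_c = \sum_{i=1}^{k} x_c^{(i)} \mathbf{A}_c^{(i)}, \tag{2}$$

where $x_c^{(i)}$ is the $i$th complex data symbol, and $\mathbf{A}_c^{(i)} \in \mathbb{C}^{N_t \times p}$ is its corresponding weight matrix. The received signal model in (1) can be written in an equivalent V-BLAST form as

$$\mathbf{y}_c = \sum_{i=1}^{k} x_c^{(i)} \left( \widehat{\mathbf{H}}_c \mathbf{a}_c^{(i)} \right) + \mathbf{n}_c = \widetilde{\mathbf{H}}_c \mathbf{x}_c + \mathbf{n}_c, \tag{3}$$

where $\mathbf{y}_c \in \mathbb{C}^{N_r p \times 1} = vec(\mathbf{Y}_c)$, $\widehat{\mathbf{H}}_c \in \mathbb{C}^{N_r p \times N_t p} = (\mathbf{I} \otimes \mathbf{H}_c)$, $\mathbf{a}_c^{(i)} \in \mathbb{C}^{N_t p \times 1} = vec(\mathbf{A}_c^{(i)})$, $\mathbf{n}_c \in \mathbb{C}^{N_r p \times 1} = vec(\mathbf{N}_c)$, $\mathbf{x}_c \in \mathbb{C}^{k \times 1}$ whose $i$th entry is the data symbol $x_c^{(i)}$, and $\widetilde{\mathbf{H}}_c \in \mathbb{C}^{N_r p \times k}$ whose $i$th column is $\widehat{\mathbf{H}}_c \mathbf{a}_c^{(i)}$, $i = 1, 2, \cdots, k$. Each element of $\mathbf{x}_c$ is an $M$-PAM/$M$-QAM symbol. Let $\mathbf{y}_c$, $\widetilde{\mathbf{H}}_c$, $\mathbf{x}_c$, $\mathbf{n}_c$ be decomposed into real and imaginary parts as:

$$\mathbf{y}_c = \mathbf{y}_I + j\mathbf{y}_Q, \quad \mathbf{x}_c = \mathbf{x}_I + j\mathbf{x}_Q, \quad \mathbf{n}_c = \mathbf{n}_I + j\mathbf{n}_Q, \quad \widetilde{\mathbf{H}}_c = \mathbf{H}_I + j\mathbf{H}_Q. \tag{4}$$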



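Further, we define $\mathbf{H}_r \in \mathbb{R}^{2N_r p \times 2k}$, $\mathbf{y}_r \in \mathbb{R}^{2N_r p \times 1}$, $\mathbf{x}_r \in \mathbb{R}^{2k \times 1}$, and $\mathbf{n}_r \in \mathbb{R}^{2N_r p \times 1}$ as

$$\mathbf{H}_r = \begin{bmatrix} \mathbf{H}_I & -\mathbf{H}_Q \\ \mathbf{H}_Q & \mathbf{H}_I \end{bmatrix}, \quad \mathbf{y}_r = [\mathbf{y}_I^T \ \mathbf{y}_Q^T]^T, \tag{5}$$

$$\mathbf{x}_r = [\mathbf{x}_I^T \ \mathbf{x}_Q^T]^T, \quad \mathbf{n}_r = [\mathbf{n}_I^T \ \mathbf{n}_Q^T]^T. \tag{6}$$

Now, (3) can be written as

$$\mathbf{y}_r = \mathbf{H}_r \mathbf{x}_r + \mathbf{n}_r. \tag{7}$$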
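The real-valued form can be checked numerically on a toy example. The sketch below stacks real and imaginary parts in the standard way that the real-valued model implies, and verifies that the real system reproduces the complex one in the noiseless case. The matrix and symbol values are arbitrary illustrations, not from the paper.

```python
def matvec(A, v):
    """Matrix-vector product for lists of rows (works for real or complex entries)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def real_decomposition(H_c, x_c):
    """Build H_r = [[H_I, -H_Q], [H_Q, H_I]] and x_r = [x_I; x_Q]."""
    top = [[h.real for h in row] + [-h.imag for h in row] for row in H_c]
    bottom = [[h.imag for h in row] + [h.real for h in row] for row in H_c]
    x_r = [x.real for x in x_c] + [x.imag for x in x_c]
    return top + bottom, x_r

# Toy 3 x 2 complex channel and two 4-QAM symbols (illustrative values only)
H_c = [[1 + 2j, 0.5 - 1j], [-1j, 2 + 0j], [0.25 + 0.25j, -1 + 1j]]
x_c = [1 - 1j, -1 + 1j]

y_c = matvec(H_c, x_c)                      # noiseless complex observation
H_r, x_r = real_decomposition(H_c, x_c)
y_r = matvec(H_r, x_r)                      # real-valued observation

expected = [y.real for y in y_c] + [y.imag for y in y_c]
max_err = max(abs(a - b) for a, b in zip(y_r, expected))
print(len(H_r), len(H_r[0]), max_err)
```

The real system has twice as many rows and columns as the complex one, which is why a $k$-symbol QAM problem becomes a $2k$-dimensional PAM problem.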

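Henceforth, we work with the real-valued system in (7). For notational simplicity, we drop subscripts $r$ in (7) and write

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{8}$$

where $\mathbf{H} = \mathbf{H}_r \in \mathbb{R}^{2N_r p \times 2k}$, $\mathbf{y} = \mathbf{y}_r \in \mathbb{R}^{2N_r p \times 1}$, $\mathbf{x} = \mathbf{x}_r \in \mathbb{R}^{2k \times 1}$, and $\mathbf{n} = \mathbf{n}_r \in \mathbb{R}^{2N_r p \times 1}$. We assume that the channel coefficients are known at the receiver but not at the transmitter. Let $\mathbb{A} = \{a_q, q = 1, 2, \cdots, M\}$, where $a_q = 2q - 1 - M$, denote the $M$-PAM signal set from which $x_i$ (the $i$th entry of $\mathbf{x}$) takes values, $i = 0, \cdots, 2k - 1$. The ML solution is given by

$$\mathbf{d}_{ML} = \arg\min_{\mathbf{d} \in \mathbb{A}^{2k}} \ \mathbf{d}^T \mathbf{H}^T \mathbf{H} \mathbf{d} - 2\mathbf{y}^T \mathbf{H} \mathbf{d}, \tag{9}$$

whose complexity is exponential in $k$.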
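To make the exponential complexity concrete, the following sketch evaluates the ML cost by exhaustive search over all $M^{2k}$ candidate vectors. It is only feasible for toy sizes; the matrix below is an arbitrary illustration and the function names are not from the paper.

```python
from itertools import product

def pam_alphabet(M):
    """M-PAM signal set a_q = 2q - 1 - M, q = 1..M."""
    return [2 * q - 1 - M for q in range(1, M + 1)]

def ml_cost(H, y, d):
    """d^T H^T H d - 2 y^T H d, evaluated as ||Hd||^2 - 2 y^T (Hd)."""
    Hd = [sum(h * di for h, di in zip(row, d)) for row in H]
    return sum(v * v for v in Hd) - 2 * sum(yi * v for yi, v in zip(y, Hd))

def brute_force_ml(H, y, M):
    """Exhaustive search over A^n, where n is the number of columns of H."""
    n = len(H[0])
    return min(product(pam_alphabet(M), repeat=n), key=lambda d: ml_cost(H, y, d))

# Toy real-valued system with 3 unknowns (illustrative values only)
H = [[1.0, 0.2, -0.3], [0.1, 0.9, 0.4], [-0.2, 0.3, 1.1], [0.5, -0.4, 0.2]]
x_true = (1, -1, 1)
y = [sum(h * x for h, x in zip(row, x_true)) for row in H]  # noiseless

d_ml = brute_force_ml(H, y, M=2)
print(d_ml, "searched", len(pam_alphabet(2)) ** len(x_true), "candidates")
```

For the 12 × 12 STBC with 4-QAM, the same search would have to visit $2^{288}$ candidates, which motivates the local search that follows.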

A. Full-rate Non-orthogonal STBCs from CDA 

We focus on the decoding of square (i.e., $n = p = N_t$), full-rate (i.e., $k = pn = N_t^2$), circulant (where the weight matrices $\mathbf{A}_c^{(i)}$'s are permutation type), non-orthogonal STBCs from CDA [1], whose construction for an arbitrary number of transmit antennas $n$ is given by the matrix in Eqn. (9.a). In (9.a), $\omega_n = e^{j2\pi/n}$, $j = \sqrt{-1}$, and $d_{u,v}$, $0 \leq u, v \leq n - 1$, are the $n^2$ data symbols from a QAM alphabet. When $\delta = t = 1$, the code in (9.a) is information lossless (ILL), and when $\delta = e^{\sqrt{5}\,j}$ and $t = e^{j}$, it is of full-diversity and information lossless (FD-ILL) [1]. High spectral efficiencies with large $n$ can be achieved using this code construction. However, since these STBCs are non-orthogonal, ML detection gets increasingly impractical for large $n$. Consequently, a key challenge in realizing the benefits of these large STBCs in practice is that of achieving near-ML performance for large $n$ at low decoding complexities. The BER performance results we report in Section IV show that the RTS based decoding algorithm we present in the following section essentially meets this challenge.

III. RTS Algorithm for Large Non-Orthogonal 
STBC Decoding 

In this section, we present the RTS algorithm, which is an iterative local search algorithm, for decoding non-orthogonal STBCs. The goal is to get $\hat{\mathbf{x}}$, an estimate of $\mathbf{x}$, given $\mathbf{y}$ and $\mathbf{H}$.

Neighborhood Definitions: For each vector in the solution space, define the neighborhood structure as follows. The symbol neighborhood of a signal point $a_q \in \mathbb{A}$, $q = 1, 2, \cdots, M$, is defined as a set $\mathcal{N}(a_q) \subseteq \mathbb{A} \setminus \{a_q\}$; e.g., for 4-PAM, $\mathbb{A} = \{-3, -1, 1, 3\}$, one possible symbol neighborhood structure could be $\mathcal{N}(-3) = \{-1, 1\}$, $\mathcal{N}(-1) = \{-3, 1\}$, $\mathcal{N}(1) = \{-1, 3\}$, $\mathcal{N}(3) = \{1, -1\}$. Then, $N = |\mathcal{N}(a_q)|$, $\forall q \in \{1, \cdots, M\}$, is the number of symbol neighbors of $a_q$. Note that the maximum and minimum values $N$ can take are $M - 1$ and 1, respectively. Let $\mathbf{x}^{(m)} = [x_0^{(m)} \ x_1^{(m)} \cdots x_{2k-1}^{(m)}]$ denote the data vector in the $m$th iteration. We refer to the vector

$$\mathbf{z}^{(m)}(u,v) = [z_0^{(m)}(u,v) \ z_1^{(m)}(u,v) \cdots z_{2k-1}^{(m)}(u,v)] \tag{10}$$

as the $(u,v)$th vector neighbor of $\mathbf{x}^{(m)}$, $u = 0, \cdots, 2k - 1$, $v = 0, \cdots, N - 1$, if 1) $\mathbf{x}^{(m)}$ differs from $\mathbf{z}^{(m)}(u,v)$ only in the $u$th coordinate, and 2) the $u$th element of $\mathbf{z}^{(m)}(u,v)$ is the $v$th symbol neighbor of $x_u^{(m)}$. That is,

$$z_i^{(m)}(u,v) = \begin{cases} x_i^{(m)} & \text{for } i \neq u \\ w_v(x_u^{(m)}) & \text{for } i = u, \end{cases} \tag{11}$$

where $w_v(a)$, $v = 0, 1, \cdots, N - 1$, is the $v$th element in $\mathcal{N}(a)$. So we will have $2kN$ vectors which differ from a given vector in the solution space in only one coordinate. These $2kN$ vectors form the neighborhood of the given vector. Note that bit-flipping is a special case with $N = 1$ and $M = 2$.
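The neighborhood definitions can be sketched in a few lines. The text only says the 4-PAM neighborhoods are "one possible" structure; the sketch below assumes the $N$ nearest alphabet points are chosen (ties broken toward the smaller value), which happens to reproduce the 4-PAM example. The function names are illustrative.

```python
def pam_alphabet(M):
    """M-PAM signal set a_q = 2q - 1 - M, q = 1..M."""
    return [2 * q - 1 - M for q in range(1, M + 1)]

def symbol_neighbors(a, M, N):
    """Assumed rule: the N alphabet points nearest to a, excluding a itself."""
    others = [s for s in pam_alphabet(M) if s != a]
    return sorted(others, key=lambda s: (abs(s - a), s))[:N]

def vector_neighbors(x, M, N):
    """Yield ((u, v), z) for every vector z that differs from x only in coordinate u."""
    for u, xu in enumerate(x):
        for v, w in enumerate(symbol_neighbors(xu, M, N)):
            z = list(x)
            z[u] = w          # v-th symbol neighbor of x_u
            yield (u, v), z

for a in pam_alphabet(4):
    print(a, symbol_neighbors(a, 4, 2))

x = [-3, 1, 3, -1]            # a length-4 data vector (2k = 4)
print(len(list(vector_neighbors(x, 4, 2))))   # 2kN = 8 neighbors
```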

The algorithm is said to execute a move $(u,v)$ if $\mathbf{x}^{(m+1)} = \mathbf{z}^{(m)}(u,v)$. The number of candidates to be considered for a move in the $m$th iteration is $2kN$. Since the coordinate that changes in a move can take $M$ possible values for $M$-PAM, the total number of possible moves is $2kMN$. The tabu value of a move, which is a non-negative integer, means that the move cannot be considered for that many subsequent iterations, unless certain conditions are satisfied.

Tabu Matrix: A tabu_matrix of size $2kM \times N$ is the matrix whose entries denote the tabu values of moves. The $(r,s)$th entry of the tabu_matrix corresponds to the move $(u,v)$ from $\mathbf{x}^{(m)}$ when $u = \lfloor (r-1)/M \rfloor + 1$, $v = s$, and $x_u^{(m)} = a_q$, where $q = \mathrm{mod}(r - 1, M) + 1$.
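The row index of the tabu matrix packs a coordinate and the symbol currently held at that coordinate. A minimal sketch of that mapping, assuming 1-based $u$, $q$ and $r$ as in the expression $(u - 1)M + q$ used by the algorithm, is given below; the helper names are hypothetical.

```python
def move_to_row(u, q, M):
    """Row r (1-based) for coordinate u (1-based) currently holding symbol a_q (1-based q)."""
    return (u - 1) * M + q

def row_to_move(r, M):
    """Inverse map: recover (u, q) from the 1-based row index r."""
    return (r - 1) // M + 1, (r - 1) % M + 1

def new_tabu_matrix(k, M, N):
    """All-zero 2kM x N table of tabu values, so that no move is tabu initially."""
    return [[0] * N for _ in range(2 * k * M)]

M, k, N = 4, 2, 2
T = new_tabu_matrix(k, M, N)
r = move_to_row(u=3, q=2, M=M)
print(r, row_to_move(r, M), len(T), len(T[0]))
```

Each row therefore identifies a (coordinate, current symbol) pair, and each column identifies which symbol neighbor the move goes to.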

RTS Algorithm: Let $\mathbf{g}^{(m)}$ be the vector which has the least ML cost found till the $m$th iteration of the algorithm. Let $l_{rep}$ be the average length (in number of iterations) between two successive occurrences of the same solution vector (repetitions), at the end of an iteration. The tabu period, $P$, a dynamic non-negative integer parameter, is defined as follows: if a move is marked as tabu in an iteration, it will remain tabu for $P$ subsequent iterations. The algorithm starts with an initial solution vector $\mathbf{x}^{(0)}$, which could be, for example, the MMSE or MF output vector. Set $\mathbf{g}^{(0)} = \mathbf{x}^{(0)}$, $l_{rep} = 0$, and $P = P_0$. All the entries of the tabu_matrix are set to zero. The following steps 1) to 3) are performed in each iteration. Consider the $m$th iteration in the algorithm, $m \geq 0$.



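Step 1): Define $\mathbf{y}_{mf} = \mathbf{H}^T \mathbf{y}$, $\mathbf{R} = \mathbf{H}^T \mathbf{H}$, and $\mathbf{f}^{(m)} = \mathbf{R}\mathbf{x}^{(m)} - \mathbf{y}_{mf}$. Let $\mathbf{e}^{(m)}(u,v) = \mathbf{z}^{(m)}(u,v) - \mathbf{x}^{(m)}$. The ML costs of the $2kN$ neighbors of $\mathbf{x}^{(m)}$, namely, $\mathbf{z}^{(m)}(u,v)$, $u = 0, \cdots, 2k - 1$; $v = 0, \cdots, N - 1$, are computed as

$$\begin{aligned}
\phi(\mathbf{z}^{(m)}(u,v)) &= (\mathbf{x}^{(m)} + \mathbf{e}^{(m)}(u,v))^T \mathbf{R} (\mathbf{x}^{(m)} + \mathbf{e}^{(m)}(u,v)) - 2(\mathbf{x}^{(m)} + \mathbf{e}^{(m)}(u,v))^T \mathbf{y}_{mf} \\
&= \phi(\mathbf{x}^{(m)}) + 2(\mathbf{e}^{(m)}(u,v))^T \mathbf{R} \mathbf{x}^{(m)} + (\mathbf{e}^{(m)}(u,v))^T \mathbf{R} \mathbf{e}^{(m)}(u,v) - 2(\mathbf{e}^{(m)}(u,v))^T \mathbf{y}_{mf} \\
&= \phi(\mathbf{x}^{(m)}) + \underbrace{2 e_u^{(m)}(u,v) f_u^{(m)} + (e_u^{(m)}(u,v))^2 R_{u,u}}_{\mathcal{C}(e_u^{(m)}(u,v))},
\end{aligned} \tag{12}$$

where $e_u^{(m)}(u,v)$ is the $u$th element of $\mathbf{e}^{(m)}(u,v)$, $f_u^{(m)}$ is the $u$th element of $\mathbf{f}^{(m)}$, and $R_{u,u}$ is the $(u,u)$th element of $\mathbf{R}$. $\phi(\mathbf{x}^{(m)})$ on the RHS in (12) can be dropped since it will not affect the cost minimization. Let

$$(u_1, v_1) = \arg\min_{u,v} \ \mathcal{C}(e_u^{(m)}(u,v)). \tag{13}$$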
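The point of the cost expression is that each neighbor's cost change needs only one entry of $\mathbf{f}^{(m)}$ and one diagonal entry of $\mathbf{R}$, rather than a full quadratic form. The sketch below checks this shortcut against direct evaluation for the bit-flipping case ($M = 2$, $N = 1$) on an arbitrary toy system; the data values and function names are illustrative assumptions.

```python
def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def gram(H):
    """R = H^T H."""
    cols = list(zip(*H))
    return [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def phi(R, y_mf, x):
    """ML cost x^T R x - 2 x^T y_mf."""
    Rx = matvec(R, x)
    return sum(xi * ri for xi, ri in zip(x, Rx)) - 2 * sum(xi * yi for xi, yi in zip(x, y_mf))

def move_cost(R, f, u, e_u):
    """Cost change of altering coordinate u by e_u: 2 e_u f_u + e_u^2 R_uu."""
    return 2 * e_u * f[u] + e_u * e_u * R[u][u]

H = [[1.0, 0.5, -0.25], [0.0, 2.0, 1.0], [-1.0, 0.5, 0.75], [0.5, -1.5, 1.0]]
y = [0.5, -1.0, 2.0, 1.5]
x = [1, -1, 1]                                  # current solution vector

R = gram(H)
y_mf = matvec(list(zip(*H)), y)                 # H^T y
f = [a - b for a, b in zip(matvec(R, x), y_mf)] # f = R x - y_mf

costs = {}
for u in range(len(x)):
    e_u = -2 * x[u]                             # flipping x_u to -x_u
    costs[u] = move_cost(R, f, u, e_u)
    z = list(x)
    z[u] = -x[u]
    # the shortcut must agree with the difference of full ML costs
    assert abs((phi(R, y_mf, z) - phi(R, y_mf, x)) - costs[u]) < 1e-9

u1 = min(costs, key=costs.get)                  # best (lowest-cost) move
print(u1, costs[u1])
```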



The move $(u_1, v_1)$ is accepted if any one of the following two conditions is satisfied:

i) $\phi(\mathbf{z}^{(m)}(u_1, v_1)) < \phi(\mathbf{g}^{(m)})$
ii) tabu_matrix$((u_1 - 1)M + q, v_1) = 0$, where $q$ is such that $x_{u_1}^{(m)} = a_q \in \mathbb{A}$.

If move $(u_1, v_1)$ is accepted, then make

$$\mathbf{x}^{(m+1)} = \mathbf{x}^{(m)} + \mathbf{e}^{(m)}(u_1, v_1). \tag{14}$$

If move $(u_1, v_1)$ is not accepted (i.e., neither of conditions i) and ii) is satisfied), find $(u_2, v_2)$ such that

$$(u_2, v_2) = \arg\min_{u,v \,:\, (u,v) \neq (u_1,v_1)} \ \mathcal{C}(e_u^{(m)}(u,v)), \tag{15}$$

and check for acceptance of the $(u_2, v_2)$ move. If this also cannot be accepted, repeat the procedure for $(u_3, v_3)$, and so on. If all the $2kN$ moves are tabu, then all the tabu_matrix entries are decremented by the minimum value in the tabu_matrix; this goes on till one of the moves becomes permissible. Let $(u', v')$ be the index of the neighbor with the minimum cost for which the move is permitted. The variables $q'$, $q''$, $v''$ are implicitly defined by $a_{q'} = x_{u'}^{(m)} = w_{v''}(x_{u'}^{(m+1)})$ (see the definition of $w_v(a)$ below Eqn. (11)), and $x_{u'}^{(m+1)} = a_{q''}$, where $a_{q'}, a_{q''} \in \mathbb{A}$.

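Eqn. (9.a), the STBC construction referenced in Section II-A:

$$\begin{bmatrix}
\sum_{i=0}^{n-1} d_{0,i}\, t^i & \delta \sum_{i=0}^{n-1} d_{n-1,i}\, \omega_n^{i} t^i & \delta \sum_{i=0}^{n-1} d_{n-2,i}\, \omega_n^{2i} t^i & \cdots & \delta \sum_{i=0}^{n-1} d_{1,i}\, \omega_n^{(n-1)i} t^i \\
\sum_{i=0}^{n-1} d_{1,i}\, t^i & \sum_{i=0}^{n-1} d_{0,i}\, \omega_n^{i} t^i & \delta \sum_{i=0}^{n-1} d_{n-1,i}\, \omega_n^{2i} t^i & \cdots & \delta \sum_{i=0}^{n-1} d_{2,i}\, \omega_n^{(n-1)i} t^i \\
\sum_{i=0}^{n-1} d_{2,i}\, t^i & \sum_{i=0}^{n-1} d_{1,i}\, \omega_n^{i} t^i & \sum_{i=0}^{n-1} d_{0,i}\, \omega_n^{2i} t^i & \cdots & \delta \sum_{i=0}^{n-1} d_{3,i}\, \omega_n^{(n-1)i} t^i \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sum_{i=0}^{n-1} d_{n-2,i}\, t^i & \sum_{i=0}^{n-1} d_{n-3,i}\, \omega_n^{i} t^i & \sum_{i=0}^{n-1} d_{n-4,i}\, \omega_n^{2i} t^i & \cdots & \delta \sum_{i=0}^{n-1} d_{n-1,i}\, \omega_n^{(n-1)i} t^i \\
\sum_{i=0}^{n-1} d_{n-1,i}\, t^i & \sum_{i=0}^{n-1} d_{n-2,i}\, \omega_n^{i} t^i & \sum_{i=0}^{n-1} d_{n-3,i}\, \omega_n^{2i} t^i & \cdots & \sum_{i=0}^{n-1} d_{0,i}\, \omega_n^{(n-1)i} t^i
\end{bmatrix} \tag{9.a}$$

Step 2): After a move is done, the new solution vector is checked for repetition. For the channel model in (8), repetition can be checked by comparing the ML costs of the solutions in the previous iterations. If there is a repetition, the length of the repetition from the previous occurrence is found, the average length, $l_{rep}$, is updated, and the tabu period $P$ is modified as $P = P + 1$. If the number of iterations elapsed since the last change of the value of $P$ exceeds $\beta l_{rep}$, for a fixed $\beta > 0$, make $P = P - 1$. The minimum value of $P$, however, will be 1. Note that this step, if executed, also qualifies as one which changed $P$. After a move $(u', v')$ is accepted, if $\phi(\mathbf{x}^{(m+1)}) < \phi(\mathbf{g}^{(m)})$, make

$$\begin{aligned}
\mathrm{tabu\_matrix}((u' - 1)M + q', v') &= 0, \\
\mathrm{tabu\_matrix}((u' - 1)M + q'', v'') &= 0,
\end{aligned} \tag{16}$$

and $\mathbf{g}^{(m+1)} = \mathbf{x}^{(m+1)}$; else,

$$\begin{aligned}
\mathrm{tabu\_matrix}((u' - 1)M + q', v') &= P + 1, \\
\mathrm{tabu\_matrix}((u' - 1)M + q'', v'') &= P + 1,
\end{aligned} \tag{17}$$

and $\mathbf{g}^{(m+1)} = \mathbf{g}^{(m)}$.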
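The "reactive" part of the search is the adaptation of the tabu period in Step 2. A minimal sketch of that update rule is shown below. It assumes that the increase on repetition and the decrease after a quiet interval are mutually exclusive within one iteration, and that a running mean is used for the average repetition length; both are assumptions of this sketch, as are the function names.

```python
def update_average(l_rep, n_reps, new_length):
    """Running mean of repetition lengths after observing one more repetition."""
    return (l_rep * n_reps + new_length) / (n_reps + 1)

def update_tabu_period(P, repeated, iters_since_change, l_rep, beta):
    """Return (new_P, new_iters_since_change) after one iteration of Step 2.

    - On a repetition, the period grows: P <- P + 1.
    - If P has not changed for more than beta * l_rep iterations, it shrinks,
      but never below 1. Executing this branch counts as a change of P.
    """
    if repeated:
        return P + 1, 0
    if iters_since_change > beta * l_rep:
        return max(1, P - 1), 0
    return P, iters_since_change + 1

print(update_tabu_period(2, True, 5, 3.0, 1.0))    # repetition seen: period grows
print(update_tabu_period(3, False, 4, 3.0, 1.0))   # quiet for long enough: period shrinks
print(update_tabu_period(1, False, 10, 3.0, 1.0))  # period is floored at 1
print(update_tabu_period(2, False, 1, 3.0, 1.0))   # nothing happens, counter advances
```

A longer tabu period pushes the search away from recently visited solutions, while the slow decay lets it return to fine-grained local search once repetitions stop.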

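Step 3): Update the entries of the tabu_matrix as

$$\mathrm{tabu\_matrix}(r,s) = \max\{\mathrm{tabu\_matrix}(r,s) - 1, 0\}, \tag{18}$$

for $r = 0, \cdots, 2kM - 1$, $s = 0, \cdots, N - 1$. $\mathbf{f}^{(m)}$ is updated as

$$\mathbf{f}^{(m+1)} = \mathbf{f}^{(m)} + e_{u'}^{(m)}(u',v')\, \mathbf{R}_{u'}, \tag{19}$$

where $\mathbf{R}_{u'}$ is the $u'$th column of $\mathbf{R}$.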
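Both updates in Step 3 are cheap: the tabu values age by one, and $\mathbf{f}$ is refreshed with a single scaled column of $\mathbf{R}$ instead of a full matrix-vector product. The sketch below illustrates this with small integer data chosen only for illustration and checks the column update against recomputing $\mathbf{R}\mathbf{x} - \mathbf{y}_{mf}$ from scratch.

```python
def decrement_tabu(T):
    """Age every tabu value by one iteration, never going below zero."""
    for row in T:
        for s in range(len(row)):
            row[s] = max(row[s] - 1, 0)

def update_f(f, R, u, e_u):
    """f <- f + e_u * (u-th column of R), after coordinate u changed by e_u."""
    return [fi + e_u * R[i][u] for i, fi in enumerate(f)]

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

R = [[4, 1, 0], [1, 3, -1], [0, -1, 2]]       # a symmetric toy R = H^T H
y_mf = [1, -2, 3]
x = [1, -1, 1]
f = [a - b for a, b in zip(matvec(R, x), y_mf)]

u, e_u = 1, 2                                  # move: x_1 goes from -1 to +1
x_new = list(x)
x_new[u] += e_u
f_new = update_f(f, R, u, e_u)
f_direct = [a - b for a, b in zip(matvec(R, x_new), y_mf)]
print(f_new == f_direct)

T = [[3, 0], [1, 2]]
decrement_tabu(T)
print(T)
```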

Stopping criterion: The algorithm can be stopped based on a fixed number of iterations. Though convergence can be slow at low SNRs (typically hundreds of iterations), it can be fast (typically tens of iterations) at moderate to high SNRs. So rather than fixing a large number of iterations to stop the algorithm irrespective of the SNR, we use an efficient stopping criterion which makes use of the knowledge of the best ML cost in a given iteration, as follows.

Since the ML criterion is to minimize $\|\mathbf{H}\mathbf{x} - \mathbf{y}\|^2 = \mathbf{x}^T \mathbf{H}^T \mathbf{H} \mathbf{x} - 2\mathbf{x}^T \mathbf{H}^T \mathbf{y} + \mathbf{y}^T \mathbf{y} \geq 0$, the minimum value of the objective function $\mathbf{x}^T \mathbf{H}^T \mathbf{H} \mathbf{x} - 2\mathbf{x}^T \mathbf{H}^T \mathbf{y}$ is never less than $-\mathbf{y}^T \mathbf{y}$. We stop the algorithm when the least ML cost achieved in an iteration is within a certain range of this lower bound on the global minimum, $-\mathbf{y}^T \mathbf{y}$. We stop the algorithm in the $m$th iteration if the condition

$$\frac{|\phi(\mathbf{g}^{(m)}) - (-\mathbf{y}^T \mathbf{y})|}{|-\mathbf{y}^T \mathbf{y}|} < \alpha_1 \tag{20}$$

is met with at least min_iter iterations being completed, to make sure the search algorithm has 'settled.' The bound is gradually relaxed as the number of iterations increases, and the algorithm is terminated when

$$\frac{|\phi(\mathbf{g}^{(m)}) - (-\mathbf{y}^T \mathbf{y})|}{|-\mathbf{y}^T \mathbf{y}|} < m\alpha_2. \tag{21}$$

In addition, we terminate the algorithm whenever the number of repetitions of solutions exceeds max_rep. Also, the maximum number of iterations is set to max_iter. We have found that use of the following stopping criterion parameters results in low complexity without compromising much on the performance (compared to a fixed number of iterations of 300) for 4-QAM: min_iter = 20, max_iter = 300, max_rep = 75, $\alpha_1 = 0.05$, and $\alpha_2 = 0.0005$.
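The stopping rules can be collected into one predicate. The text does not spell out exactly how the fixed bound $\alpha_1$ and the relaxed bound $m\alpha_2$ are combined, so the sketch below makes an assumption: after min_iter iterations, the search stops when the relative gap to $-\mathbf{y}^T\mathbf{y}$ falls below the larger of the two. The function and argument names are illustrative; the default values are the 4-QAM parameters quoted above.

```python
def should_stop(best_cost, yTy, m, n_reps,
                min_iter=20, max_iter=300, max_rep=75,
                alpha1=0.05, alpha2=0.0005):
    """Decide whether to stop at iteration m.

    best_cost : least ML cost found so far, phi(g^(m))
    yTy       : y^T y, so that -yTy lower-bounds the ML cost
    n_reps    : number of repeated solutions observed so far
    """
    if m >= max_iter or n_reps > max_rep:
        return True
    if m < min_iter:
        return False                       # let the search settle first
    gap = abs(best_cost + yTy) / abs(yTy)  # relative distance to the lower bound
    # Assumed combination: alpha1 early on, relaxed to m * alpha2 once that is larger
    return gap < max(alpha1, m * alpha2)

print(should_stop(-96.0, 100.0, m=20, n_reps=0))    # gap 0.04 is under alpha1
print(should_stop(-96.0, 100.0, m=10, n_reps=0))    # too early to stop
print(should_stop(-92.0, 100.0, m=150, n_reps=0))   # gap 0.08 vs relaxed bound 0.075
print(should_stop(-92.0, 100.0, m=200, n_reps=0))   # gap 0.08 vs relaxed bound 0.1
```

With $\alpha_2 = 0.0005$, the relaxed bound $m\alpha_2$ overtakes $\alpha_1 = 0.05$ after 100 iterations, so long-running searches are allowed to stop with a looser fit.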



IV. Simulation Results 

In this section, we present the uncoded and coded BER performance of the RTS algorithm in decoding non-orthogonal STBCs with $\delta = t = 1$ (i.e., ILL) and $\delta = e^{\sqrt{5}\,j}$, $t = e^{j}$ (i.e., FD-ILL)$^2$. The following RTS parameters are used in all the simulations: MMSE initial vector, $P_0 = 2$, $\beta = 1, 0.1$, $\alpha_1 = 5\%$, $\alpha_2 = 0.05\%$, max_rep = 75, max_iter = 300, min_iter = 20.

A. Uncoded BER performance of RTS: 

RTS versus LAS Performance: In Fig. 1, we plot the uncoded BER of the RTS algorithm as a function of average received SNR per receive antenna, $\gamma$ [?], in decoding 4 × 4 (32 dimensions), 8 × 8 (128 dimensions) and 12 × 12 (288 dimensions) non-orthogonal ILL STBCs for 4-QAM and $N_t = N_r$. Perfect CSIR and i.i.d. fading are assumed. For the same settings, the performance of the LAS algorithm in [4]-[6] is also plotted for comparison. The MMSE initial vector is used in both RTS and LAS. As a reference, we have plotted the BER performance on a SISO AWGN channel as well. From Fig. 1, the following interesting observations can be made:

• The BER of the RTS algorithm improves and approaches SISO AWGN performance as $N_t = N_r$ (i.e., STBC size) is increased; e.g., performance within 0.5 dB of SISO AWGN performance is achieved at $10^{-3}$ uncoded BER in decoding the 12 × 12 STBC with 288 real dimensions.

• The RTS algorithm performs better than the LAS algorithm (see RTS and LAS BER plots for 4 × 4 and 8 × 8 STBCs). Further, while both RTS and LAS algorithms exhibit large system behavior (i.e., BER improves as $N_t = N_r$ is increased), RTS is able to achieve nearness to SISO AWGN performance at $10^{-3}$ BER with fewer dimensions than LAS. This is evident by observing that, while LAS requires 512 dimensions (16 × 16 STBC) to achieve 1 dB closeness to SISO AWGN performance at $10^{-3}$ BER, RTS is able to achieve even 0.5 dB closeness with just 288 dimensions (12 × 12 STBC). RTS is able to achieve this better performance because, while the bit/symbol-flipping strategies are similar in both RTS and LAS, the inherent escape strategy in RTS allows it to move out of local minima and move towards better solutions. Consequently, RTS incurs some extra complexity compared to LAS, without an increase in the order of complexity.

RTS performance in V-BLAST: A similar observation can be made with the uncoded BER of RTS detection in V-BLAST in Fig. 2 for $N_t = N_r$ and 4-QAM. From Fig. 2, it is seen that LAS requires 128 dimensions (64 × 64 V-BLAST) to achieve performance within 1 dB of SISO AWGN performance at $10^{-3}$ BER, whereas RTS is able to achieve even better closeness with just 64 dimensions (32 × 32 V-BLAST). In summary, the ability to achieve near SISO AWGN performance with fewer dimensions than LAS is an attractive feature of RTS.

$^2$ Our simulation results show that the BER performance of FD-ILL and ILL STBCs with RTS decoding is almost the same.
Fig. 1. Uncoded BER of RTS decoding of 4 × 4, 8 × 8 and 12 × 12 non-orthogonal STBCs from CDA. $N_t = N_r$, ILL STBCs ($\delta = t = 1$), 4-QAM. RTS parameters: $P_0 = 2$, $\beta = 1$, $\alpha_1 = 5\%$, $\alpha_2 = 0.05\%$, max_iter = 300, min_iter = 20. RTS achieves near SISO AWGN performance for increasing $N_t = N_r$ (i.e., STBC size). RTS performs better than LAS.

B. Turbo coded BER performance of RTS 

Figure 3 shows the rate-3/4 turbo coded BER of RTS decoding of the 12 × 12 non-orthogonal ILL STBC with $N_t = N_r$ and 4-QAM (corresponding to a spectral efficiency of 18 bps/Hz), under perfect CSIR and i.i.d. fading. The theoretical minimum SNR required to achieve 18 bps/Hz spectral efficiency on an $N_t = N_r = 12$ MIMO channel with perfect CSIR and i.i.d. fading is 4.27 dB (obtained through simulation of the ergodic capacity formula [15]). From Fig. 3, it is seen that RTS decoding is able to achieve a vertical fall in coded BER within about 5 dB of the theoretical minimum SNR, which is good nearness to capacity performance. This nearness to capacity can be further improved by 1 to 1.5 dB if soft decision values, proposed in [?], are fed to the turbo decoder.

C. Iterative RTS Decoding/Channel Estimation 

Fig. 2. Uncoded BER of RTS detection of V-BLAST with $N_t = N_r$ and 4-QAM. RTS parameters: $P_0 = 2$, $\beta = 0.1$, $\alpha_1 = 5\%$, $\alpha_2 = 0.05\%$, max_iter = 300, min_iter = 20. RTS achieves near SISO AWGN performance for increasing $N_t = N_r$. RTS performs better than LAS.

Next, we relax the perfect CSIR assumption by considering a training based iterative RTS decoding/channel estimation scheme. Transmission is carried out in frames, where one $N_t \times N_t$ pilot matrix (for training purposes) followed by $N_d$ data STBC matrices are sent in each frame, as shown in Fig. 4. One frame length, $T$ (taken to be the channel coherence time), is $T = (N_d + 1)N_t$ channel uses. The proposed scheme works as follows [16]: i) obtain an MMSE estimate of the channel matrix during the pilot phase, ii) use the estimated channel matrix to decode the data STBC matrices using the RTS algorithm, and iii) iterate between channel estimation and RTS decoding for a certain number of times. For the 12 × 12 ILL STBC, in addition to perfect CSIR performance, Fig. 3 also shows the performance with CSIR estimated using the above iterative RTS decoding/channel estimation scheme for $N_d = 8$ and $N_d = 20$. Two iterations between RTS decoding and channel estimation are used. With $N_d = 20$ (which corresponds to large coherence times, i.e., slow fading), the BER and bps/Hz with estimated CSIR get closer to those with perfect CSIR.
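The effect of the frame structure on throughput can be illustrated with a short calculation. The sketch below assumes the only rate loss is the one pilot matrix per frame, so the data-carrying fraction is $N_d/(N_d + 1)$; the resulting numbers are an illustration under that assumption and are not results reported in the paper.

```python
def frame_length(n_d: int, n_t: int) -> int:
    """Channel uses per frame: one N_t x N_t pilot matrix plus N_d data STBC matrices."""
    return (n_d + 1) * n_t

def effective_spectral_efficiency(se_perfect_csir: float, n_d: int) -> float:
    """bps/Hz after accounting for pilot overhead only (assumed data fraction N_d / (N_d + 1))."""
    return se_perfect_csir * n_d / (n_d + 1)

for n_d in (8, 20):
    print(n_d, frame_length(n_d, 12), round(effective_spectral_efficiency(18.0, n_d), 2))
```

A larger $N_d$ amortizes the pilot over more data matrices, which is consistent with the observation that slow fading brings the throughput with estimated CSIR closer to the perfect-CSIR value.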

D. Effect of MIMO Spatial Correlation 

In Figs. 1 to 3, we assumed i.i.d. fading. But spatial correlation at transmit/receive antennas and the structure of the scattering and propagation environment can affect the rank structure of the MIMO channel, resulting in degraded performance [17],[18]. We relaxed the i.i.d. fading assumption by considering the correlated MIMO channel model proposed by Gesbert et al. in [18], which takes into account carrier frequency ($f_c$), spacing between antenna elements ($d_t$, $d_r$), distance between transmit and receive antennas ($R$), and scattering environment. In Fig. 5, we plot the uncoded BER of RTS decoding of the 12 × 12 FD-ILL STBC with perfect CSIR in i) i.i.d. fading, and ii) the correlated MIMO fading model in [18]. It is seen that, compared to i.i.d. fading, there is a loss in diversity order under spatial correlation for $N_t = N_r = 12$; further, use of more receive antennas ($N_r = 14$, $N_t = 12$), without increase in the receiver aperture, alleviates this loss in performance. Finally, we note that we have carried out simulations of RTS decoding for 16-QAM as well, where results similar to those reported here for 4-QAM are observed. RTS decoding can be used to decode perfect codes [19],[20] of large dimensions as well.

V. Conclusions 

We presented a reactive tabu search based low-complexity algorithm for decoding high-rate non-orthogonal STBCs having large dimensions that can achieve high spectral efficiencies of the order of tens of bps/Hz. The RTS algorithm was shown to achieve near SISO AWGN uncoded BER performance as well as near-capacity turbo coded BER performance in non-orthogonal STBC MIMO systems with large dimensions. The algorithm performed well with estimated CSIR using a training-based iterative decoding/channel estimation scheme. In addition, the algorithm could perform well in the presence of MIMO spatial correlation when more receive dimensions are used. Comparing the performance of the RTS algorithm with the LAS algorithm (which we presented recently), we pointed out that the ability to achieve near SISO AWGN performance with fewer dimensions than LAS is an attractive feature of RTS.

Fig. 3. Turbo coded BER of RTS decoding of 12 × 12 non-orthogonal ILL STBC with $N_t = N_r$, 4-QAM, rate-3/4 turbo code, and 18 bps/Hz. RTS parameters: $P_0 = 2$, $\beta = 1$, $\alpha_1 = 5\%$, $\alpha_2 = 0.05\%$, max_iter = 300, min_iter = 20. BER of RTS with estimated CSIR approaches close to that with perfect CSIR for increasing $N_d$ (i.e., slow fading).

Fig. 4. Transmission scheme with one pilot matrix followed by $N_d$ data STBC matrices in each frame.

Fig. 5. Effect of spatial correlation on the performance of RTS decoding of 12 × 12 FD-ILL STBC with $N_t = 12$, $N_r = 12, 14$, 4-QAM, rate-3/4 turbo code, 18 bps/Hz. Correlated MIMO channel parameters: $f_c = 5$ GHz, $R = 500$ m, $S = 30$, $D_t = D_r = 20$ m, $\theta_t = \theta_r = 90°$, $N_r d_r = N_t d_t = 72$ cm. Spatial correlation degrades achieved diversity order compared to i.i.d. fading. Increasing $N_r$ alleviates this performance loss.
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Abstract — Non-orthogonal space-time block codes (STBC) with 
large dimensions are attractive because they can simultaneously 
achieve both high spectral efficiencies (same spectral efficiency 
as in V-BLAST for a given number of transmit antennas) as well 
as full transmit diversity. Decoding of non-orthogonal STBCs 
with large dimensions has been a challenge. In this paper, we 
present a reactive tabu search (RTS) based algorithm for decod- 
ing non-orthogonal STBCs from cyclic division algebras (CDA) 
having large dimensions. Under i.i.d fading and perfect channel 
state information at the receiver (CSIR), our simulation results 
show that RTS based decoding of 12 x 12 STBC from CDA and 
4-QAM with 288 real dimensions achieves i) 10 -3 uncoded BER 
at an SNR of just 0.5 dB away from SISO AWGN performance, 
and ii) a coded BER performance close to within about 5 dB of 
the theoretical MIMO capacity, using rate-3/4 turbo code at a 
spectral efficiency of 18 bps/Hz. RTS is shown to achieve near 
SISO AWGN performance with less number of dimensions than 
with LAS algorithm (which we reported recently) at some extra 
complexity than LAS. We also report good BER performance of 
RTS when i.i.d fading and perfect CSIR assumptions are relaxed 
by considering a spatially correlated MIMO channel model, and 
by using a training based iterative RTS decoding/channel esti- 
mation scheme. 

I. Introduction 
MIMO systems that employ non-orthogonal space-time block 
codes (STBC) from cyclic division algebras (CDA) for ar- 
bitrary number of transmit antennas, N t , are attractive be- 
cause they can simultaneously provide both full-rate (i.e., Nt 
complex symbols per channel use, which is same as in V- 
BLAST) as well as full transmit diversity [?],[?]. The 2 x 2 
Golden code is a well known non-orthogonal STBC from 
CDA for 2 transmit antennas [?]. High spectral efficiencies 
of the order of tens of bps/Hz can be achieved using large 
non-orthogonal STBCs. For e.g., a 16 x 16 STBC from CDA 
has 256 complex symbols in it with 512 real dimensions; with 
16-QAM and rate-3/4 turbo code, this system offers a high 
spectral efficiency of 48 bps/Hz. Decoding of non-orthogonal 
STBCs with such large dimensions, however, has been a chal- 
lenge. Sphere decoder and its low-complexity variants are 
prohibitively complex for decoding such STBCs with hun- 
dreds of dimensions. Recently, we proposed a low-complexity 
near-ML achieving algorithm to decode large non-orthogonal 
STBCs from CDA; this algorithm, which is based on bit- 
flipping approach, is termed as likelihood ascent search (LAS) 
algorithm [?]-[?]. In this paper, we present a reactive tabu 
search (RTS) based approach to near-ML decoding of non- 
orthogonal STBCs with large dimensions. 

Key attractive features of the proposed RTS based decod- 
ing are its low-complexity and near-ML performance in sys- 
tems with large dimensions (e.g., hundreds of dimensions). 
While creating hundreds of dimensions in space alone (e.g., 
V-BLAST) requires hundreds of antennas, use of non-orthogonal 
STBCs from CDA can create hundreds of dimensions with 
just tens of antennas (space) and tens of channel uses (time). 



Given that 802.11 smart WiFi products with 12 transmit an- 
tennas 1 at 2.5 GHz are now commercially available [?] (which 
establishes that issues related to placement of many anten- 
nas and RF/IF chains can be solved in large aperture com- 
munication terminals like set-top boxes/laptops), large non- 
orthogonal STBCs (e.g., 16 x 16 STBC from CDA) in com- 
bination with large dimension near-ML decoding using RTS 
can enable communications at increased spectral efficiencies 
of the order of tens of bps/Hz (note that current standards 
achieve only < 10 bps/Hz using only up to 4 tx antennas). 

Tabu search (TS), a heuristic originally designed to obtain
approximate solutions to combinatorial optimization problems
[?]-[?], is increasingly applied in communication problems
[?]-[?]. For example, in [?], design of constellation label maps to
maximize asymptotic coding gain is formulated as a quadratic
assignment problem (QAP), which is solved using RTS [?].
The RTS approach is shown to be effective in terms of BER
performance and efficient in terms of computational complexity
in CDMA multiuser detection [?]. In [?], a fixed TS based
detection in V-BLAST is presented. In this paper, we
establish that RTS based decoding of non-orthogonal STBCs
can achieve excellent BER performance (near-ML and
near-capacity performance) in large dimensions at practically
affordable low complexities. We also present a stopping criterion
for the RTS algorithm. RTS for large dimension
non-orthogonal STBC decoding has not been reported so far. Our
results in this paper can be summarized as follows:

• Under i.i.d fading and perfect channel state information
at the receiver (CSIR), our simulation results show that
RTS based decoding of 12 x 12 STBC from CDA and
4-QAM (288 real dimensions) achieves i) $10^{-3}$ uncoded
BER at an SNR of just 0.5 dB away from SISO AWGN
performance, and ii) a coded BER performance
within about 5 dB of the theoretical capacity using a
rate-3/4 turbo code at a spectral efficiency of 18 bps/Hz.

• Compared to the LAS algorithm we reported recently in
[?]-[?], RTS achieves near-SISO AWGN performance
with fewer dimensions than LAS; this is
achieved at some extra complexity compared to LAS.

• We report good BER performance when i.i.d fading and 
perfect CSIR assumptions are relaxed by adopting a spa- 
tially correlated MIMO channel model, and a training 
based iterative RTS decoding/channel estimation scheme. 

¹ 12 antennas in these products are now used only for beamforming. Single-beam multi-antenna approaches can offer range increase and interference avoidance, but not spectral efficiency increase.

II. Non-Orthogonal STBC MIMO System Model

Consider a STBC MIMO system with multiple transmit and receive antennas. An $(n, p, k)$ STBC is represented by a matrix $\mathbf{X}_c \in \mathbb{C}^{n \times p}$, where $n$ and $p$ denote the number of transmit antennas and number of time slots, respectively, and $k$ denotes the number of complex data symbols sent in one STBC matrix. The $(i, j)$th entry in $\mathbf{X}_c$ represents the complex number transmitted from the $i$th transmit antenna in the $j$th time slot. The rate of an STBC is $k/p$. Let $N_r$ and $N_t = n$ denote the number of receive and transmit antennas, respectively. Let $\mathbf{H}_c \in \mathbb{C}^{N_r \times N_t}$ denote the channel gain matrix, where the $(i, j)$th entry in $\mathbf{H}_c$ is the complex channel gain from the $j$th transmit antenna to the $i$th receive antenna. We assume that the channel gains remain constant over one STBC matrix and vary (i.i.d) from one STBC matrix to the other. Assuming rich scattering, we model the entries of $\mathbf{H}_c$ as i.i.d $\mathcal{CN}(0, 1)$. The received space-time signal matrix, $\mathbf{Y}_c \in \mathbb{C}^{N_r \times p}$, can be written as

$$\mathbf{Y}_c = \mathbf{H}_c \mathbf{X}_c + \mathbf{N}_c, \tag{1}$$

where $\mathbf{N}_c \in \mathbb{C}^{N_r \times p}$ is the noise matrix at the receiver and its entries are modeled as i.i.d $\mathcal{CN}(0, \sigma^2 = N_t E_s / \gamma)$, where $E_s$ is the average energy of the transmitted symbols, and $\gamma$ is the average received SNR per receive antenna [?], and the $(i, j)$th entry in $\mathbf{Y}_c$ is the received signal at the $i$th receive antenna in the $j$th time slot. Consider linear dispersion STBCs, where $\mathbf{X}_c$ can be written in the form [?]

$$\mathbf{X}_c = \sum_{i=1}^{k} x_c^{(i)} \mathbf{A}_c^{(i)}, \tag{2}$$

where $x_c^{(i)}$ is the $i$th complex data symbol, and $\mathbf{A}_c^{(i)} \in \mathbb{C}^{N_t \times p}$ is its corresponding weight matrix. The received signal model in (??) can be written in an equivalent V-BLAST form as



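$$\mathbf{y}_c = \sum_{i=1}^{k} x_c^{(i)} \left(\hat{\mathbf{H}}_c \mathbf{a}_c^{(i)}\right) + \mathbf{n}_c = \tilde{\mathbf{H}}_c \mathbf{x}_c + \mathbf{n}_c, \tag{3}$$

where $\mathbf{y}_c \in \mathbb{C}^{N_r p \times 1} = vec(\mathbf{Y}_c)$, $\hat{\mathbf{H}}_c \in \mathbb{C}^{N_r p \times N_t p} = (\mathbf{I} \otimes \mathbf{H}_c)$, $\mathbf{a}_c^{(i)} \in \mathbb{C}^{N_t p \times 1} = vec(\mathbf{A}_c^{(i)})$, $\mathbf{n}_c \in \mathbb{C}^{N_r p \times 1} = vec(\mathbf{N}_c)$, $\mathbf{x}_c \in \mathbb{C}^{k \times 1}$ whose $i$th entry is the data symbol $x_c^{(i)}$, and $\tilde{\mathbf{H}}_c \in \mathbb{C}^{N_r p \times k}$ whose $i$th column is $\hat{\mathbf{H}}_c \mathbf{a}_c^{(i)}$, $i = 1, 2, \cdots, k$. Each element of $\mathbf{x}_c$ is an M-PAM/M-QAM symbol. Let $\mathbf{y}_c$, $\tilde{\mathbf{H}}_c$, $\mathbf{x}_c$, $\mathbf{n}_c$ be decomposed into real and imaginary parts as:

$$\mathbf{y}_c = \mathbf{y}_I + j\mathbf{y}_Q, \quad \mathbf{x}_c = \mathbf{x}_I + j\mathbf{x}_Q, \quad \mathbf{n}_c = \mathbf{n}_I + j\mathbf{n}_Q, \quad \tilde{\mathbf{H}}_c = \mathbf{H}_I + j\mathbf{H}_Q. \tag{4}$$

Further, we define $\mathbf{H}_r \in \mathbb{R}^{2N_r p \times 2k}$, $\mathbf{y}_r \in \mathbb{R}^{2N_r p \times 1}$, $\mathbf{x}_r \in \mathbb{R}^{2k \times 1}$, and $\mathbf{n}_r \in \mathbb{R}^{2N_r p \times 1}$ as

$$\mathbf{H}_r = \begin{bmatrix} \mathbf{H}_I & -\mathbf{H}_Q \\ \mathbf{H}_Q & \mathbf{H}_I \end{bmatrix}, \quad \mathbf{y}_r = [\mathbf{y}_I^T \ \mathbf{y}_Q^T]^T, \tag{5}$$

$$\mathbf{x}_r = [\mathbf{x}_I^T \ \mathbf{x}_Q^T]^T, \quad \mathbf{n}_r = [\mathbf{n}_I^T \ \mathbf{n}_Q^T]^T. \tag{6}$$

Now, (??) can be written as

$$\mathbf{y}_r = \mathbf{H}_r \mathbf{x}_r + \mathbf{n}_r. \tag{7}$$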
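
As a sanity check of the chain from the linear dispersion form to the real-valued model, the following sketch builds both sides numerically for a tiny noiseless example. The sizes, the random weight matrices, and the helper names (`crandn`, `vec`) are illustrative assumptions, not the CDA construction of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
Nt, Nr, p = 2, 2, 2
k = Nt * p  # full-rate choice for this toy example

def crandn(*shape):
    """i.i.d. CN(0, 1) samples."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

def vec(M):
    """Stack the columns of M into one vector."""
    return M.reshape(-1, order="F")

Hc = crandn(Nr, Nt)
A = [crandn(Nt, p) for _ in range(k)]     # hypothetical weight matrices
xc = crandn(k)                            # stand-in for the data symbols
Xc = sum(x * Ai for x, Ai in zip(xc, A))  # linear dispersion form, eq. (2)
Yc = Hc @ Xc                              # eq. (1) with the noise omitted

# Equivalent V-BLAST form, eq. (3): vec(Hc Xc) = (I kron Hc) vec(Xc)
H_hat = np.kron(np.eye(p), Hc)
H_tilde = np.column_stack([H_hat @ vec(Ai) for Ai in A])
yc = vec(Yc)
assert np.allclose(yc, H_tilde @ xc)

# Real-valued form, eqs. (4)-(7)
HI, HQ = H_tilde.real, H_tilde.imag
Hr = np.block([[HI, -HQ], [HQ, HI]])
xr = np.concatenate([xc.real, xc.imag])
yr = np.concatenate([yc.real, yc.imag])
assert np.allclose(yr, Hr @ xr)
print(Hr.shape)  # (2*Nr*p, 2*k)
```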
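Henceforth, we work with the real-valued system in (??). For notational simplicity, we drop subscripts $r$ in (??) and write

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{8}$$

where $\mathbf{H} = \mathbf{H}_r \in \mathbb{R}^{2N_r p \times 2k}$, $\mathbf{y} = \mathbf{y}_r \in \mathbb{R}^{2N_r p \times 1}$, $\mathbf{x} = \mathbf{x}_r \in \mathbb{R}^{2k \times 1}$, and $\mathbf{n} = \mathbf{n}_r \in \mathbb{R}^{2N_r p \times 1}$. We assume that the channel coefficients are known at the receiver but not at the transmitter. Let $\mathbb{A} = \{a_q, q = 1, 2, \cdots, M\}$, where $a_q = 2q - 1 - M$, denote the M-PAM signal set from which $x_i$ ($i$th entry of $\mathbf{x}$) takes values, $i = 0, \cdots, 2k - 1$. The ML solution is given by

$$\mathbf{d}_{ML} = \arg\min_{\mathbf{d} \in \mathbb{A}^{2k}} \ \mathbf{d}^T \mathbf{H}^T \mathbf{H} \mathbf{d} - 2\mathbf{y}^T \mathbf{H} \mathbf{d}, \tag{9}$$

whose complexity is exponential in $k$.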
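
To make the exponential cost concrete, here is a minimal brute-force sketch of the ML rule in (9). It enumerates all $M^{2k}$ candidates, so it is only usable for toy sizes; the function name `ml_detect` and the small random real-valued test system are assumptions for illustration.

```python
import itertools
import numpy as np

def ml_detect(H, y, M):
    """Exhaustive search of eq. (9); the candidate count grows as M^(2k)."""
    alphabet = [2 * q - 1 - M for q in range(1, M + 1)]  # a_q = 2q - 1 - M
    R, y_mf = H.T @ H, H.T @ y
    best, best_cost = None, np.inf
    for cand in itertools.product(alphabet, repeat=H.shape[1]):
        d = np.array(cand, dtype=float)
        cost = d @ R @ d - 2 * y_mf @ d   # d^T H^T H d - 2 y^T H d
        if cost < best_cost:
            best, best_cost = d, cost
    return best

rng = np.random.default_rng(1)
H = rng.standard_normal((8, 8))        # toy real-valued system, 2k = 8
x = rng.choice([-1.0, 1.0], size=8)    # 2-PAM per real dimension
y = H @ x                              # noiseless observation
x_hat = ml_detect(H, y, M=2)           # visits 2^8 = 256 candidates
```

Even the 4 x 4 STBC with 4-QAM in this paper has 32 real dimensions, i.e. $2^{32}$ candidates, which is why a low-complexity search is needed.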



A. Full-rate Non-orthogonal STBCs from CDA 

We focus on the decoding of square (i.e., $n = p = N_t$), full-rate (i.e., $k = pn = N_t^2$), circulant (where the weight matrices $\mathbf{A}_c^{(i)}$'s are permutation type), non-orthogonal STBCs from CDA [?], whose construction for arbitrary number of transmit antennas $n$ is given by the matrix in Eqn. (9.a). In (9.a), $\omega_n = e^{j2\pi/n}$, $j = \sqrt{-1}$, and $d_{u,v}$, $0 \le u, v \le n - 1$, are the $n^2$ data symbols from a QAM alphabet. When $\delta = t = 1$, the code in (9.a) is information lossless (ILL), and when $\delta = e^{\sqrt{5}\,j}$ and $t = e^{j}$, it is of full-diversity and information lossless (FD-ILL) [?]. High spectral efficiencies with large $n$ can be achieved using this code construction. However, since these STBCs are non-orthogonal, ML detection gets increasingly impractical for large $n$. Consequently, a key challenge in realizing the benefits of these large STBCs in practice is that of achieving near-ML performance for large $n$ at low decoding complexities. The BER performance results we report in Sec. ?? show that the RTS based decoding algorithm we present in the following section essentially meets this challenge.

III. RTS Algorithm for Large Non-Orthogonal STBC Decoding

In this section, we present the RTS algorithm for decoding non-orthogonal STBCs. The goal is to get $\hat{\mathbf{x}}$, an estimate of $\mathbf{x}$, given $\mathbf{y}$ and $\mathbf{H}$.

Neighborhood Definitions: For each vector in the solution space, define the neighborhood structure as follows. The symbol neighborhood of a signal point $a_q \in \mathbb{A}$, $q = 1, 2, \cdots, M$, is defined as the set $\mathcal{N}(a_q) \subseteq \mathbb{A} - \{a_q\}$; e.g., for 4-PAM, $\mathbb{A} = \{-3, -1, 1, 3\}$, and $\mathcal{N}(-3) = \{-1, 1\}$, $\mathcal{N}(1) = \{-1, 3\}$, and so on. Then, $N = |\mathcal{N}(a_q)|$, $\forall q \in \{1, \cdots, M\}$, is the number of symbol neighbors of $a_q$. Note that the maximum and minimum values $N$ can take are $M - 1$ and 1, respectively. Let $\mathbf{x}^{(m)} = [x_0^{(m)} \ x_1^{(m)} \cdots x_{2k-1}^{(m)}]$ denote the data vector in the $m$th iteration. We refer to the vector

$$\mathbf{z}^{(m)}(u,v) = [z_0^{(m)}(u,v) \ \ z_1^{(m)}(u,v) \ \cdots \ z_{2k-1}^{(m)}(u,v)], \tag{10}$$

as the $(u, v)$th vector neighbor of $\mathbf{x}^{(m)}$, $u = 0, \cdots, 2k - 1$, $v = 0, \cdots, N - 1$, if 1) $\mathbf{x}^{(m)}$ differs from $\mathbf{z}^{(m)}(u, v)$ in the $u$th coordinate only, and 2) the $u$th element of $\mathbf{z}^{(m)}(u, v)$ is the $v$th symbol neighbor of $x_u^{(m)}$. That is,

$$z_i^{(m)}(u,v) = \begin{cases} x_i^{(m)} & \text{for } i \neq u \\ w_v(x_u^{(m)}) & \text{for } i = u, \end{cases} \tag{11}$$

where $w_v(a)$, $v = 0, 1, \cdots, N - 1$, is the $v$th element in $\mathcal{N}(a)$. So we will have $2kN$ vectors which differ from a given vector in the solution space in only one coordinate. These $2kN$ vectors form the neighborhood of the given vector. It is noted that bit-flipping is a special case with $N = 1$ and $M = 2$.
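
The neighborhood definitions can be sketched as follows. The paper gives the 4-PAM example but does not state the general rule for choosing $\mathcal{N}(a_q)$; this sketch assumes the $N$ nearest symbols (ties broken toward the smaller symbol), which reproduces the two sets quoted above. Function names are illustrative.

```python
def pam_alphabet(M):
    """M-PAM signal set with a_q = 2q - 1 - M, q = 1..M."""
    return [2 * q - 1 - M for q in range(1, M + 1)]

def symbol_neighbors(a, M, N):
    """Assumed rule: the N symbols closest to a, excluding a itself."""
    others = [s for s in pam_alphabet(M) if s != a]
    nearest = sorted(others, key=lambda s: (abs(s - a), s))[:N]
    return sorted(nearest)

def vector_neighbors(x, M, N):
    """Yield ((u, v), z) for all 2kN vectors differing from x in one coordinate."""
    for u, xu in enumerate(x):
        for v, w in enumerate(symbol_neighbors(xu, M, N)):
            z = list(x)
            z[u] = w            # eq. (11): only the u-th coordinate changes
            yield (u, v), z

print(symbol_neighbors(-3, 4, 2))  # [-1, 1]
print(symbol_neighbors(1, 4, 2))   # [-1, 3]
```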



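$$\begin{bmatrix}
\sum_{i=0}^{n-1} d_{0,i}\, t^i & \delta \sum_{i=0}^{n-1} d_{n-1,i}\, \omega_n^{i} t^i & \delta \sum_{i=0}^{n-1} d_{n-2,i}\, \omega_n^{2i} t^i & \cdots & \delta \sum_{i=0}^{n-1} d_{1,i}\, \omega_n^{(n-1)i} t^i \\
\sum_{i=0}^{n-1} d_{1,i}\, t^i & \sum_{i=0}^{n-1} d_{0,i}\, \omega_n^{i} t^i & \delta \sum_{i=0}^{n-1} d_{n-1,i}\, \omega_n^{2i} t^i & \cdots & \delta \sum_{i=0}^{n-1} d_{2,i}\, \omega_n^{(n-1)i} t^i \\
\sum_{i=0}^{n-1} d_{2,i}\, t^i & \sum_{i=0}^{n-1} d_{1,i}\, \omega_n^{i} t^i & \sum_{i=0}^{n-1} d_{0,i}\, \omega_n^{2i} t^i & \cdots & \delta \sum_{i=0}^{n-1} d_{3,i}\, \omega_n^{(n-1)i} t^i \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\sum_{i=0}^{n-1} d_{n-1,i}\, t^i & \sum_{i=0}^{n-1} d_{n-2,i}\, \omega_n^{i} t^i & \sum_{i=0}^{n-1} d_{n-3,i}\, \omega_n^{2i} t^i & \cdots & \sum_{i=0}^{n-1} d_{0,i}\, \omega_n^{(n-1)i} t^i
\end{bmatrix} \tag{9.a}$$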



Tabu Matrix: A tabu_matrix of size $2kM \times N$ with non-negative integer entries is created; this matrix will contain the tabu information for all the moves in the search. A non-zero entry in the tabu_matrix means that the corresponding move is tabu.

RTS Algorithm: Let $\mathbf{g}^{(m)}$ be the vector which has the least ML cost found till the $m$th iteration of the algorithm. Let $l_{rep}$ be the average length (in number of iterations) between two successive occurrences of the same solution vector (repetitions), at the end of an iteration. Tabu period, $P$, a dynamic non-negative integer parameter, is defined. If a move is marked as tabu in an iteration, it will remain tabu for $P$ subsequent iterations. The algorithm starts with an initial solution vector $\mathbf{x}^{(0)}$, which, for example, could be the MMSE or MF output vector. Set $\mathbf{g}^{(0)} = \mathbf{x}^{(0)}$, $l_{rep} = 0$, and $P = P_0$. All the entries of the tabu_matrix are set to zero. The following steps 1) to 3) are performed in each iteration. Consider the $m$th iteration in the algorithm, $m \ge 0$.

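Step 1): Define $\mathbf{y}_{mf} = \mathbf{H}^T \mathbf{y}$, $\mathbf{R} = \mathbf{H}^T \mathbf{H}$, and $\mathbf{f}^{(m)} = \mathbf{R}\mathbf{x}^{(m)} - \mathbf{y}_{mf}$. Let $\mathbf{e}^{(m)}(u,v) = \mathbf{z}^{(m)}(u,v) - \mathbf{x}^{(m)}$. The ML costs of the $2kN$ neighbors of $\mathbf{x}^{(m)}$, namely, $\mathbf{z}^{(m)}(u, v)$, $u = 0, \cdots, 2k - 1$, $v = 0, \cdots, N - 1$, are computed as

$$\begin{aligned}
\phi(\mathbf{z}^{(m)}(u,v)) &= (\mathbf{x}^{(m)} + \mathbf{e}^{(m)}(u,v))^T \mathbf{R}\, (\mathbf{x}^{(m)} + \mathbf{e}^{(m)}(u,v)) - 2(\mathbf{x}^{(m)} + \mathbf{e}^{(m)}(u,v))^T \mathbf{y}_{mf} \\
&= \phi(\mathbf{x}^{(m)}) + 2(\mathbf{e}^{(m)}(u,v))^T \mathbf{R}\mathbf{x}^{(m)} + (\mathbf{e}^{(m)}(u,v))^T \mathbf{R}\, \mathbf{e}^{(m)}(u,v) - 2(\mathbf{e}^{(m)}(u,v))^T \mathbf{y}_{mf} \\
&= \phi(\mathbf{x}^{(m)}) + \underbrace{2 e_u^{(m)}(u,v) f_u^{(m)} + (e_u^{(m)}(u,v))^2 R_{u,u}}_{\triangleq\ \mathcal{C}(e_u^{(m)}(u,v))},
\end{aligned} \tag{12}$$

where $e_u^{(m)}(u, v)$ is the $u$th element of $\mathbf{e}^{(m)}(u,v)$, $f_u^{(m)}$ is the $u$th element of $\mathbf{f}^{(m)}$, and $R_{u,u}$ is the $(u, u)$th element of $\mathbf{R}$. $\phi(\mathbf{x}^{(m)})$ on the RHS in (??) can be dropped since it will not affect the cost minimization. Let

$$(u_1, v_1) = \arg\min_{u,v} \ \mathcal{C}(e_u^{(m)}(u,v)). \tag{13}$$

The move $(u_1, v_1)$ is accepted if any one of the following two conditions is satisfied:

i) $\phi(\mathbf{z}^{(m)}(u_1, v_1)) < \phi(\mathbf{g}^{(m)})$

ii) tabu_matrix$((u_1 - 1)M + q, v_1) = 0$, where $q : x_{u_1}^{(m)} = a_q \in \mathbb{A}$.

If move $(u_1, v_1)$ is accepted, then make

$$\mathbf{x}^{(m+1)} = \mathbf{x}^{(m)} + \mathbf{e}^{(m)}(u_1, v_1). \tag{14}$$

If move $(u_1, v_1)$ is not accepted (i.e., neither of conditions i) and ii) is satisfied), find $(u_2, v_2)$ such that

$$(u_2, v_2) = \arg\min_{u,v \,:\, u \neq u_1,\, v \neq v_1} \ \mathcal{C}(e_u^{(m)}(u,v)), \tag{15}$$

and check for acceptance of the $(u_2, v_2)$ move. If this also cannot be accepted, repeat the procedure for $(u_3, v_3)$, and so on. If all the $2kN$ moves are tabu, then all the tabu_matrix entries are decremented by the minimum value in the tabu_matrix; this goes on till one of the moves becomes permissible. Let $(u', v')$ be the index of the neighbor with the minimum cost for which the move is permitted. Let $x_{u'}^{(m)} = a_{q'}$ and $x_{u'}^{(m+1)} = a_{q''}$, with $a_{q'} = w_{v''}(a_{q''})$, where $a_{q'}, a_{q''} \in \mathbb{A}$.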
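
The point of (12) is that each neighbor's cost change needs only $f_u^{(m)}$ and $R_{u,u}$, not a full quadratic form. The sketch below checks this shortcut against direct evaluation on a small random system, and also checks the rank-one update of $\mathbf{f}$ that keeps the shortcut valid after a move. The system size, the full neighborhood ($N = M - 1$), and the helper names are assumptions for illustration.

```python
import numpy as np

def neighbor_cost_deltas(x, f, R, neighbors):
    """C(e_u) = 2 e_u f_u + e_u^2 R_uu for every move (u, v), per eq. (12)."""
    deltas = {}
    for u in range(len(x)):
        for v, w in enumerate(neighbors(x[u])):
            e_u = w - x[u]
            deltas[(u, v)] = 2 * e_u * f[u] + e_u ** 2 * R[u, u]
    return deltas

def phi(x, R, y_mf):
    """ML cost x^T R x - 2 x^T y_mf."""
    return x @ R @ x - 2 * x @ y_mf

rng = np.random.default_rng(2)
H = rng.standard_normal((6, 6))
y = rng.standard_normal(6)
R, y_mf = H.T @ H, H.T @ y
pam4 = (-3.0, -1.0, 1.0, 3.0)
x = rng.choice(pam4, size=6)
f = R @ x - y_mf
nbrs = lambda a: [s for s in pam4 if s != a]   # full neighborhood, N = M - 1

deltas = neighbor_cost_deltas(x, f, R, nbrs)
for (u, v), c in deltas.items():
    z = x.copy()
    z[u] = nbrs(x[u])[v]
    assert np.isclose(phi(z, R, y_mf) - phi(x, R, y_mf), c)

# After taking the best move, f can be refreshed with one column of R.
u1, v1 = min(deltas, key=deltas.get)
e = nbrs(x[u1])[v1] - x[u1]
x_new = x.copy()
x_new[u1] += e
f_new = f + e * R[:, u1]
assert np.allclose(f_new, R @ x_new - y_mf)
```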



Step 2): After a move is done, the new solution vector is checked for repetition. For the channel model in (??), repetition can be checked by comparing the ML costs of the solutions in the previous iterations. If there is a repetition, the length of the repetition from the previous occurrence is found, and the average length, $l_{rep}$, is updated. The tabu period $P$ is modified as $P = P + 1$. If the number of iterations elapsed since the last change of the value of $P$ exceeds $\beta l_{rep}$, for a fixed $\beta > 0$, make $P = P - 1$. The minimum value of $P$, however, will be 1. Note that this step, if executed, also qualifies as the one which changed $P$. After a move $(u', v')$ is accepted, make

tabu_matrix$((u' - 1)M + q', v') = P + 1$,
tabu_matrix$((u' - 1)M + q'', v'') = P + 1$, (16)

and $\mathbf{g}^{(m+1)} = \mathbf{g}^{(m)}$. However, if $\phi(\mathbf{x}^{(m+1)}) < \phi(\mathbf{g}^{(m)})$, then make

tabu_matrix$((u' - 1)M + q', v') = 0$,
tabu_matrix$((u' - 1)M + q'', v'') = 0$, (17)

and $\mathbf{g}^{(m+1)} = \mathbf{x}^{(m+1)}$.
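
The reactive part of Step 2 (adapting the tabu period $P$) can be sketched as a small bookkeeping class. This is one reading of the description above, not the authors' implementation: repetition is detected by matching ML costs up to rounding, and the class and attribute names are hypothetical.

```python
class ReactivePeriod:
    """Tracks repetitions and adapts the tabu period P (sketch of Step 2)."""

    def __init__(self, P0, beta):
        self.P = P0
        self.beta = beta
        self.l_rep = 0.0       # running average repetition length
        self.n_rep = 0         # number of repetitions seen so far
        self.last_change = 0   # iteration at which P last changed
        self.last_seen = {}    # rounded ML cost -> iteration last visited

    def update(self, m, cost):
        key = round(cost, 9)   # assumption: equal cost means same solution
        if key in self.last_seen:
            length = m - self.last_seen[key]
            self.n_rep += 1
            self.l_rep += (length - self.l_rep) / self.n_rep
            self.P += 1                      # repetition: lengthen tabu period
            self.last_change = m
        if m - self.last_change > self.beta * self.l_rep:
            self.P = max(self.P - 1, 1)      # quiet for long enough: shorten
            self.last_change = m
        self.last_seen[key] = m
        return self.P

rp = ReactivePeriod(P0=2, beta=1)
history = [rp.update(m, c) for m, c in enumerate([5.0, 4.0, 5.0, 3.0, 2.0, 1.0], start=1)]
print(history)
```

In this toy run the cost 5.0 recurs at iteration 3, which raises $P$; once no repetition has occurred for more than $\beta l_{rep}$ iterations, $P$ is lowered again but never below 1.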

Step 3): Update the entries of the tabu_matrix as

tabu_matrix$(r, s) = \max\{$tabu_matrix$(r, s) - 1, 0\}$, (18)

for $r = 0, \cdots, 2kM - 1$, $s = 0, \cdots, N - 1$. $\mathbf{f}^{(m)}$ is updated as

$$\mathbf{f}^{(m+1)} = \mathbf{f}^{(m)} + e_{u'}^{(m)}(u', v')\, \mathbf{R}_{u'}, \tag{19}$$

where $\mathbf{R}_{u'}$ is the $u'$th column of $\mathbf{R}$.

Stopping criterion: The algorithm can be stopped based on a fixed number of iterations. Though convergence can be slow at low SNRs (typically hundreds of iterations), it can be fast (typically tens of iterations) at moderate to high SNRs. So rather than fixing a large number of iterations to stop the algorithm irrespective of the SNR, we use an efficient stopping criterion which makes use of the knowledge of the best ML cost in a given iteration, as follows.

Since the ML criterion is to minimize $\|\mathbf{H}\mathbf{x} - \mathbf{y}\|^2$, the minimum value of the objective function $\mathbf{x}^T \mathbf{H}^T \mathbf{H} \mathbf{x} - 2\mathbf{x}^T \mathbf{H}^T \mathbf{y}$, $\mathbf{x} \in \mathbb{R}^{2k}$, is equal to $-\mathbf{y}^T \mathbf{y}$. We stop the algorithm when the least ML cost achieved in an iteration is within a certain range of the global minimum, which is $-\mathbf{y}^T \mathbf{y}$. We stop the algorithm in the $m$th iteration, if the condition

$$\frac{|\phi(\mathbf{g}^{(m)}) - (-\mathbf{y}^T \mathbf{y})|}{|-\mathbf{y}^T \mathbf{y}|} < \alpha_1, \tag{20}$$

is met with at least min_iter iterations being completed to make sure the search algorithm has 'settled.' The bound is gradually relaxed as the number of iterations increases and the algorithm is terminated when

$$\frac{|\phi(\mathbf{g}^{(m)}) - (-\mathbf{y}^T \mathbf{y})|}{|-\mathbf{y}^T \mathbf{y}|} < m\alpha_2. \tag{21}$$

In addition, we terminate the algorithm whenever the number of repetitions of solutions exceeds max_rep. Also, the maximum number of iterations is set to max_iter. We have found that use of the following stopping criterion parameters results in low complexity without compromising much on the performance (compared to a fixed number of iterations of 300) for 4-QAM: min_iter = 20, max_iter = 300, max_rep = 75, $\alpha_1 = 0.05$, and $\alpha_2 = 0.0005$.
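
The termination rules can be collected into a single predicate, sketched below with the 4-QAM parameter values quoted above as defaults. How conditions (20) and (21) combine is an interpretation: this sketch applies both only after min_iter iterations and stops if either holds. The function and argument names are hypothetical.

```python
def should_stop(m, best_cost, yTy, n_rep, min_iter=20, max_iter=300,
                max_rep=75, alpha1=0.05, alpha2=0.0005):
    """Return True if the search should terminate at iteration m.

    best_cost is phi(g^(m)), yTy is y^T y (so the global minimum is -yTy),
    and n_rep is the number of repeated solutions seen so far.
    """
    if m >= max_iter or n_rep > max_rep:
        return True
    if m < min_iter:
        return False                          # let the search settle first
    gap = abs(best_cost - (-yTy)) / abs(-yTy)
    return gap < alpha1 or gap < m * alpha2   # eq. (20) or relaxed eq. (21)

print(should_stop(25, -96.0, 100.0, 0))   # gap 0.04 is below alpha1
print(should_stop(25, -90.0, 100.0, 0))   # gap 0.10, bound not yet relaxed
print(should_stop(250, -90.0, 100.0, 0))  # relaxed bound is 250 * 0.0005
```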



IV. Simulation Results 
In this section, we present the uncoded/coded BER performance of the RTS algorithm in decoding non-orthogonal STBCs with $\delta = t = 1$ (i.e., ILL) and $\delta = e^{\sqrt{5}\,j}$, $t = e^{j}$ (i.e., FD-ILL²). The following RTS parameters are used in all the simulations: MMSE initial vector, $P_0 = 2$, $\beta = 1, 0.1$, $\alpha_1 = 5\%$, $\alpha_2 = 0.05\%$, max_rep = 75, max_iter = 300, min_iter = 20.

A. Uncoded BER performance of RTS: 

RTS versus LAS Performance: In Fig. ??, we plot the uncoded BER of the RTS algorithm as a function of average received SNR per receive antenna, $\gamma$ [?], in decoding 4 x 4 (32 dimensions), 8 x 8 (128 dimensions) and 12 x 12 (288 dimensions) non-orthogonal ILL STBCs for 4-QAM and $N_t = N_r$. Perfect CSIR and i.i.d fading are assumed. For the same settings, the performance of the LAS algorithm in [?]-[?] is also plotted for comparison. MMSE initial vector is used in both RTS and LAS. As a reference, we have plotted the BER performance on a SISO AWGN channel as well. From Fig. ??, the following interesting observations can be made:

• The BER of the RTS algorithm improves and approaches SISO AWGN performance as $N_t = N_r$ (i.e., STBC size) is increased; e.g., performance within 0.5 dB of SISO AWGN performance is achieved at $10^{-3}$ uncoded BER in decoding 12 x 12 STBC with 288 real dimensions.

• The RTS algorithm performs better than the LAS algorithm (see RTS and LAS BER plots for 4 x 4 and 8 x 8 STBCs). Further, while both RTS and LAS algorithms exhibit large system behavior (i.e., BER improves as $N_t = N_r$ is increased), RTS is able to achieve nearness to SISO AWGN performance at $10^{-3}$ BER with fewer dimensions than LAS. This is evident by observing that, while LAS requires 512 dimensions (16 x 16 STBC) to achieve 1 dB closeness to SISO AWGN performance at $10^{-3}$ BER, RTS is able to achieve even 0.5 dB closeness with just 288 dimensions (12 x 12 STBC). RTS is able to achieve this better performance because, while the bit/symbol-flipping strategies are similar in both RTS and LAS, the inherent escape strategy in RTS allows it to move out of local minima and move towards better solutions. Consequently, RTS incurs some extra complexity compared to LAS, without increase in the order of complexity.

RTS performance in V-BLAST: A similar observation can be made with uncoded BER of RTS detection in V-BLAST in Fig. ?? for $N_t = N_r$ and 4-QAM. From Fig. ??, it is seen that LAS requires 128 dimensions (64 x 64 V-BLAST) to achieve performance within 1 dB of SISO AWGN performance at $10^{-3}$ BER, whereas RTS is able to achieve even better closeness with just 64 dimensions (32 x 32 V-BLAST). In summary, the ability to achieve near SISO AWGN performance with fewer dimensions than LAS is an attractive feature of RTS.

² Our simulation results show that the BER performance of FD-ILL and ILL STBCs with RTS decoding is almost the same.

Fig. 1. Uncoded BER of RTS decoding of 4 x 4, 8 x 8 and 12 x 12 non-orthogonal STBCs from CDA. $N_t = N_r$, ILL STBCs ($\delta = t = 1$), 4-QAM. RTS parameters: $P_0 = 2$, $\beta = 1$, $\alpha_1 = 5\%$, $\alpha_2 = 0.05\%$, max_iter = 300, min_iter = 20. RTS achieves near SISO AWGN performance for increasing $N_t = N_r$ (i.e., STBC size). RTS performs better than LAS.

Fig. 2. Uncoded BER of RTS detection of V-BLAST with $N_t = N_r$ and 4-QAM. RTS parameters: $P_0 = 2$, $\beta = 0.1$, $\alpha_1 = 5\%$, $\alpha_2 = 0.05\%$, max_iter = 300, min_iter = 20. RTS achieves near SISO AWGN performance for increasing $N_t = N_r$. RTS performs better than LAS.

B. Turbo coded BER performance of RTS

Figure ?? shows the rate-3/4 turbo coded BER of RTS decoding of 12 x 12 non-orthogonal ILL STBC with $N_t = N_r$ and 4-QAM (corresponding to a spectral efficiency of 18 bps/Hz), under perfect CSIR and i.i.d fading. The theoretical minimum SNR required to achieve 18 bps/Hz spectral efficiency on an $N_t = N_r = 12$ MIMO channel with perfect CSIR and i.i.d fading is 4.27 dB (obtained through simulation of the ergodic capacity formula [?]). From Fig. ??, it is seen that RTS decoding is able to achieve vertical fall in coded BER within about 5 dB of the theoretical minimum SNR, which is good nearness to capacity performance. This nearness to capacity can be further improved by 1 to 1.5 dB if soft decision values, proposed in [?], are fed to the turbo decoder.
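
The theoretical minimum SNR quoted above can be checked approximately by Monte Carlo. The sketch below assumes the cited formula is the standard ergodic capacity expression for an i.i.d Rayleigh channel with CSIR only, $C = E[\log_2 \det(\mathbf{I} + (\gamma/N_t)\mathbf{H}_c\mathbf{H}_c^H)]$; the function name, trial count, and seed are illustrative choices.

```python
import numpy as np

def ergodic_capacity(n_t, n_r, snr_db, trials=500, seed=0):
    """Monte Carlo estimate of E[log2 det(I + (snr/n_t) H H^H)] in bps/Hz."""
    rng = np.random.default_rng(seed)
    snr = 10 ** (snr_db / 10)
    total = 0.0
    for _ in range(trials):
        # i.i.d. CN(0, 1) channel gains
        H = (rng.standard_normal((n_r, n_t))
             + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
        G = np.eye(n_r) + (snr / n_t) * (H @ H.conj().T)
        _, logdet = np.linalg.slogdet(G)   # natural-log determinant
        total += logdet / np.log(2)
    return total / trials

cap = ergodic_capacity(12, 12, 4.27)
print(round(cap, 2))  # should come out close to 18 bps/Hz
```

Sweeping `snr_db` and locating where the estimate crosses 18 bps/Hz is one way to arrive at a minimum SNR figure of this kind.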

C. Iterative RTS Decoding/Channel Estimation 
Next, we relax the perfect CSIR assumption by considering a training based iterative RTS decoding/channel estimation scheme. Transmission is carried out in frames, where one $N_t \times N_t$ pilot matrix (for training purposes) followed by $N_d$ data STBC matrices are sent in each frame as shown in Fig. ??. One frame length, $T$ (taken to be the channel coherence time), is $T = (N_d + 1)N_t$ channel uses. The proposed



